We are privileged to have a distinguished list of invited speakers spanning the breadth of themes for the conference: Rosemary Bailey, Adrian Bowman, Richard Emsley, John Hinde, Katherine Lee, Thomas Lumley, Otso Ovaskainen, Jay Ver Hoef, and David Warton.
Desmond Patterson introduced the design key in 1965 in the context of experiments on crop rotations. It can be used whenever the treatments have factorial structure, the experimental units have a poset block structure, and an orthogonal design is required. The design key gives an algorithm for allocating treatments to experimental units, and another algorithm for identifying which stratum contains which treatment effect. These two properties make it a very useful tool when extended to multi-phase experiments.
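To make the allocation algorithm concrete, here is a minimal sketch in Python; the key matrix and layout are illustrative choices of ours, not an example from the talk. For a 2^3 factorial (factors A, B, C) in two blocks of four plots, each unit has coordinates (block, plot1, plot2) over GF(2), and the design key K assigns treatment levels t = Ku (mod 2). With the key below, the three-factor interaction ABC is confounded with blocks, so all main effects and two-factor interactions are estimated within blocks.

```python
import itertools
import numpy as np

# Hypothetical design key K for a 2^3 factorial laid out in 2 blocks of 4
# plots.  Unit coordinates are (block, plot1, plot2) over GF(2); treatment
# levels are t = K @ u (mod 2).
K = np.array([[0, 1, 0],   # A = plot1
              [0, 0, 1],   # B = plot2
              [1, 1, 1]])  # C = block + plot1 + plot2

for u in itertools.product((0, 1), repeat=3):   # all 8 experimental units
    a, b, c = K @ np.array(u) % 2
    print(f"block={u[0]} plot=({u[1]},{u[2]}) -> treatment A{a} B{b} C{c}")
```

Because this K is invertible mod 2, each of the eight treatment combinations is allocated exactly once; solving a linear system involving the transpose of K likewise identifies which treatment contrast (here ABC) falls in the block stratum, which is the second of the two algorithms mentioned above.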
Bio: R. A. Bailey is Professor of Mathematics and Statistics at the University of St Andrews. After a doctorate in finite group theory at the University of Oxford, she worked at the Open University for a few years. She embraced statistics while working on restricted randomization and factorial designs as a post-doctoral research fellow at the University of Edinburgh. She spent ten years at Rothamsted Experimental Station designing and analysing agricultural experiments before returning to academia in the University of London. She moved to St Andrews in 2013. She has also held visiting positions and fellowships in France, Australia, New Zealand, the USA and Brazil.
This talk aims to reflect a little on the languages we use to explain, discuss and communicate statistical concepts, models and analysis, both within our own community and beyond it. There will be particular emphasis on how we communicate uncertainty. Several types of analysis will be considered, mostly involving flexible regression and drawing largely on different forms of spatiotemporal data. Throughout, there will be a strong focus on the role of graphics, which can provide a powerful means of conceptual communication and give clear expression to the insights provided by models, while remaining true to the issues associated with uncertainty.
Bio: Adrian Bowman is a Professor of Statistics at the University of Glasgow. He grew up in the seaside town of Prestwick in Scotland, followed by university education in Glasgow and Cambridge, in Mathematics and then Statistics. Adrian's first academic job was at the University of Manchester, but he subsequently moved to Glasgow, where most of his career has developed and where he is now Head of the School of Mathematics & Statistics. He is joint author of a book on smoothing techniques and is very active in research, currently on environmental modelling and the analysis of anatomical shapes and brain images. Adrian also has a longstanding interest in educational technology, particularly in the role of graphics in aiding the understanding of statistical ideas. He is married to Janet, who teaches maths, and they have three grown-up children. A recent fun activity was paddling a sea kayak through the Corryvreckan (you may have to Google that) under strict supervision.
Adrian is supported by the SSAI-WA branch's Frank Hansford-Miller fellowship during his visit.
The idea that a given treatment will have more benefit for some patients than for others is the underlying foundation of what is termed personalised or stratified medicine. If we can identify who these patients are before the decision to give a treatment is made, it will help prevent other patients being exposed to an unnecessary treatment that is likely to be of little or no benefit. As well as the obvious benefits to individual patients, it will also mean that health care providers can save money on expensive treatments by targeting the right treatments to the right patients at the right time. In many clinical scenarios, multiple predictive markers are hypothesised that could be combined to target treatments, and these are often of different modalities, such as clinical, genetic and imaging markers. In this talk, I will present the underlying concepts of stratified medicine in terms of benefits and harms and describe why the underlying treatment mechanism is fundamental to the evaluation of targeted treatments. I will discuss the issue of choosing the appropriate scale of interaction (additive versus multiplicative), methods for evaluating and combining multiple predictive markers, and prospective trial designs which use analysis methods from the causal inference literature.
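As a simulated toy illustration of the additive-versus-multiplicative point (not an example from the talk): in the data below, treatment and marker effects add on the risk-difference scale, so the interaction term is essentially zero in an additive-scale model but non-zero in a multiplicative (risk ratio) model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated data: treatment t and binary marker m act additively on the risk
# scale, so there is no interaction on the additive scale but a non-zero
# interaction on the multiplicative scale.
rng = np.random.default_rng(1)
n = 20000
t = rng.integers(0, 2, n)
m = rng.integers(0, 2, n)
y = rng.binomial(1, 0.10 + 0.05 * t + 0.10 * m)
df = pd.DataFrame({"y": y, "t": t, "m": m})

# Additive scale: linear probability model (coefficients are risk differences)
add = smf.ols("y ~ t * m", df).fit()
# Multiplicative scale: log-link binomial GLM (coefficients are log risk ratios)
mult = smf.glm("y ~ t * m", df,
               family=sm.families.Binomial(link=sm.families.links.Log())).fit()
print(add.params["t:m"])   # approx 0: no interaction on the additive scale
print(mult.params["t:m"])  # approx log(1.25/1.5), non-zero: multiplicative interaction
```

The same data therefore answer "is there treatment-by-marker interaction?" differently depending on the chosen scale, which is why the choice of scale matters for targeting treatments.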
Bio: Richard is a Senior Lecturer in Biostatistics in the Centre for Biostatistics. He is also Visiting Lecturer in the Department of Biostatistics at the Institute of Psychiatry, Psychology and Neuroscience at King's College London. His research aims to answer three key questions: Are treatments effective? How do they work? Which groups are they most effective for? This involves the development of statistical methods for causal inference, efficacy and mechanisms evaluation and stratified medicine. He is the initiator and Chair of the steering group for the UK Causal Inference Meeting (UK-CIM). He is involved in stratified medicine research programmes in schizophrenia, psoriasis, arthritis and cancer.
Mixture models have now become an important tool in statistical modelling, with the EM algorithm being an important driver in their popularization and applicability. This talk will present the basic ideas and discuss a range of different mixture models and their use in addressing specific applied problems. Examples discussed will include: mixtures of non-linear regression models for fish growth studies; mixtures of smooth curves for clustering time-course microarray data; mixture orthogonal regression models for sugarcane SNP data; regression mixture models for outlier detection/accommodation.
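By way of illustration (a generic textbook sketch, not code from the talk), the EM machinery behind such models is compact: for a two-component Gaussian mixture, the E-step computes each point's responsibility and the M-step re-estimates the weighted parameters.

```python
import numpy as np

# Minimal EM for a two-component univariate Gaussian mixture (toy example).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1.5, 200)])

pi, mu, sd = 0.5, np.array([x.min(), x.max()]), np.array([1.0, 1.0])
for _ in range(200):
    # E-step: responsibility of component 2 for each point (the shared
    # 1/sqrt(2*pi) constant cancels in the ratio, so it is omitted)
    d1 = (1 - pi) * np.exp(-0.5 * ((x - mu[0]) / sd[0]) ** 2) / sd[0]
    d2 = pi * np.exp(-0.5 * ((x - mu[1]) / sd[1]) ** 2) / sd[1]
    r = d2 / (d1 + d2)
    # M-step: responsibility-weighted updates of the mixing proportion,
    # component means and standard deviations
    pi = r.mean()
    mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
    sd = np.sqrt([np.average((x - mu[0]) ** 2, weights=1 - r),
                  np.average((x - mu[1]) ** 2, weights=r)])
print(pi, mu, sd)
```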
Researchers conducting clinical and epidemiological studies are inevitably faced with problems of drop-out and missing data. Importantly, the participants who drop out of a study are often those with poor health, resulting in selection bias in the available data. Multiple imputation is gaining popularity as a strategy for handling missing data, since it enables all participants to be included in the analysis. However, it is far from a panacea for dealing with missing data, and a number of important questions remain regarding its practical application.
In this talk I will provide a brief introduction to multiple imputation and will review some of our research group's work on this powerful and versatile approach for handling missing data. This will include a discussion of when multiple imputation is likely to be beneficial over a complete case analysis, which imputation procedure to use, and the imputation of skewed, limited-range and semi-continuous variables. Importantly, I will highlight that multiple imputation is not a miracle cure and that it can introduce bias in the estimation of parameters of interest if the imputation model is not appropriate.
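To fix ideas, here is a toy sketch of multiple imputation with pooling by Rubin's rules. It is illustrative only: it uses simple stochastic regression imputation with fixed imputation-model coefficients (a fully proper procedure would also draw those coefficients from their posterior), and real analyses would use established software such as mice in R.

```python
import numpy as np
import statsmodels.api as sm

# Toy data: covariate x2 is missing at random, with missingness driven by x1.
rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1 + x1 + 2 * x2 + rng.normal(size=n)
miss = rng.random(n) < 1 / (1 + np.exp(-x1))
x2_obs = np.where(miss, np.nan, x2)
obs = ~np.isnan(x2_obs)

# Imputation model for x2 given x1 AND the outcome y (standard advice).
imp_fit = sm.OLS(x2_obs[obs],
                 sm.add_constant(np.column_stack([x1[obs], y[obs]]))).fit()

M, ests, vars_ = 20, [], []
for _ in range(M):
    # Stochastic regression imputation: prediction plus a residual draw.
    x2_m = x2_obs.copy()
    pred = imp_fit.predict(sm.add_constant(np.column_stack([x1[~obs], y[~obs]])))
    x2_m[~obs] = pred + rng.normal(0, np.sqrt(imp_fit.scale), (~obs).sum())
    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2_m]))).fit()
    ests.append(fit.params[2]); vars_.append(fit.bse[2] ** 2)

# Rubin's rules: pooled estimate, within- plus inflated between-imputation variance.
qbar, ubar, b = np.mean(ests), np.mean(vars_), np.var(ests, ddof=1)
total_var = ubar + (1 + 1 / M) * b
print(qbar, np.sqrt(total_var))
```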
Bio: Katherine Lee is a senior biostatistician at the Murdoch Childrens Research Institute. She grew up in Bristol in the UK before moving to Nottingham, where she obtained a Bachelor of Science in Mathematics from the University of Nottingham. Following this she obtained a Master of Science in Medical Statistics from the University of Leicester, UK, and a PhD in Biostatistics from the University of Cambridge. Katherine spent the first two years of her working life at the Medical Research Council Clinical Trials Unit in London, before moving to Melbourne. Katherine has over ten years' experience in the design, planning and analysis of randomised trials and observational studies, and is now Associate Director (Biostatistics) at the Melbourne Children's Trials Centre at the Royal Children's Hospital. She is also an Honorary Fellow at the University of Melbourne. Her methodological interest is in multiple imputation for dealing with missing data.
The Hispanic Community Health Study/Study of Latinos (HCHS/SOL) is a US cohort study following a multistage probability sample of approximately 16,000 Hispanics/Latinos. One component of HCHS/SOL is a set of large-scale genetic association studies for which the standard analysis would be linear mixed models with terms for ancestry and relatedness as well as shared environment. Currently, methodology is not available for incorporating complex sampling in mixed models except where the sampling structure is nested in the model structure, which is not the case for HCHS/SOL. I will talk about inferential aspects of the modelling, and about computational issues when dealing with hundreds of thousands of genetic variables and the complex sampling and mixed-model correlation structures. This is joint work with Xudong Huang and Alastair Scott.
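For readers unfamiliar with this model class, the sketch below simulates and fits a basic kinship-based linear mixed model, y = Xb + g + e with g ~ N(0, sg²K), using the standard eigendecomposition trick that makes genome-wide scans cheap. It is a generic illustration with invented data, not the HCHS/SOL methodology, which must additionally accommodate the complex sampling.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
A = rng.normal(size=(n, n // 2))
K = A @ A.T / (n // 2)                 # stand-in for a kinship matrix
lam, U = np.linalg.eigh(K)

snp = rng.binomial(2, 0.3, n).astype(float)
y = (0.2 * snp                          # true SNP effect
     + U @ (np.sqrt(np.clip(lam, 0, None)) * rng.normal(size=n))  # g ~ N(0, K)
     + rng.normal(size=n))              # residual error

X = np.column_stack([np.ones(n), snp])
yr, Xr = U.T @ y, U.T @ X               # rotate: covariance becomes diagonal

def neg_loglik(log_delta):
    # Profile negative log-likelihood in delta = sg2/se2; after rotation the
    # model is weighted least squares with weights 1/(delta*lambda_i + 1).
    w = 1.0 / (np.exp(log_delta) * lam + 1.0)
    beta = np.linalg.solve(Xr.T @ (w[:, None] * Xr), Xr.T @ (w * yr))
    resid = yr - Xr @ beta
    sig2 = np.sum(w * resid ** 2) / n
    return 0.5 * (n * np.log(sig2) - np.sum(np.log(w)))

grid = np.linspace(-5, 5, 201)          # crude 1-D search over log(delta)
best = grid[np.argmin([neg_loglik(d) for d in grid])]
w = 1.0 / (np.exp(best) * lam + 1.0)
beta = np.linalg.solve(Xr.T @ (w[:, None] * Xr), Xr.T @ (w * yr))
print("SNP effect estimate:", beta[1])
```

The payoff of the rotation is that, once K is eigendecomposed, testing each of hundreds of thousands of variants reduces to a cheap weighted least-squares fit, which is one reason this structure dominates genetic association analysis.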
Bio: Thomas Lumley is Professor of Biostatistics at the University of Auckland. He has an undergraduate degree from Monash, an MSc from Oxford, and a PhD from the University of Washington, where he subsequently spent twelve years on the academic staff. His research interests include statistical computing, semiparametric statistics and its connections with survey sampling, cardiovascular epidemiology, and genomics.
A key aim in community ecology is to understand the factors that determine the identities and abundances of species found at any given locality. Central concepts in this research field include the regional and local species pools, environmental filtering and biotic assembly rules. Typical datasets involve a matrix of presence-absences (or abundances) for a group of species at different sites, some environmental and geographical characteristics of those sites, and possibly information on the ecological traits and phylogenetic relationships of the species. The analysis of such data has traditionally been based on ordination approaches, but there is increasing interest in moving to model-based approaches, in particular joint species distribution models. I present a joint species distribution model that captures the influences of environmental filtering at the community level by measuring the amount of variation and covariation in the responses of individual species to various characteristics of their environment. The selection of the local species pool from the regional species pool involves both deterministic (e.g. systematic differences in dispersal abilities) and stochastic (e.g. spatio-temporal randomness in the realized distribution patterns) processes. Biotic assembly rules are reflected in the model with the help of an association matrix, which models positive or negative co-occurrence patterns not explained by the responses of the species to their environment. I use a latent factor approach to enable model parameterization with data on species-rich communities and thus with high-dimensional association matrices. I illustrate the performance of the approach with both simulated and real data.
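The sketch below simulates from a latent-factor joint species distribution model of this general form (dimensions and parameter values are invented for illustration, not taken from the talk): on the probit scale, site-by-species occurrence depends on environmental covariates through species-specific coefficients, while a low-rank loading matrix induces the residual association matrix.

```python
import numpy as np

# probit P(y_ij = 1) = x_i' beta_j + lambda_j' eta_i, where eta_i are latent
# site scores and the loadings Lambda induce the species association matrix.
rng = np.random.default_rng(4)
n_sites, n_species, n_env, n_factors = 200, 50, 3, 2

X = rng.normal(size=(n_sites, n_env))                 # environmental covariates
beta = rng.normal(size=(n_env, n_species))            # species-specific responses
Lambda = rng.normal(scale=0.7, size=(n_factors, n_species))  # factor loadings
eta = rng.normal(size=(n_sites, n_factors))           # latent site scores

linpred = X @ beta + eta @ Lambda
Y = (linpred + rng.normal(size=linpred.shape) > 0).astype(int)  # probit draw

# Residual association matrix: the low-rank covariance implied by the
# loadings, capturing co-occurrence not explained by the environment.
assoc = Lambda.T @ Lambda
print(Y.shape, assoc.shape)
```

The low-rank trick is what makes the approach scale: with 50 species, a free association matrix has over a thousand parameters, whereas two latent factors need only 100 loadings.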
Bio: Otso Ovaskainen is an internationally renowned researcher at the interface between mathematics, statistics and biology. His work focusses on connecting general theories with empirical research, which has led to important contributions to metapopulation theory and dispersal theory. He has recently been interested in adapting modern statistical tools to ecology, including diffusion-based stochastic processes for the study of animal movement, and latent variable models for community ecology and the study of species interactions. Otso is a Professor in the Department of Biosciences at the University of Helsinki, Finland as well as in the Norwegian University of Science and Technology in Trondheim, Norway. He has authored over 100 publications, including in Nature and Science. For his contributions to theoretical and population ecology, he won the prestigious 2009 Academy of Finland Award, awarded to only two scientists nationally each year.
Monitoring plant and animal populations is an important goal for both academic research and management of natural resources. Successful management of populations often depends on obtaining estimates of their mean or total over a region. The basic problem considered in this talk is the estimation of a total from a sample of plots containing count data, where the plot placements are spatially irregular and non-randomized. Our application had counts from thousands of irregularly spaced aerial photo images. We used change-of-support methods to model counts in images as a realization of an inhomogeneous Poisson process, with spatial basis functions used to model the spatial intensity surface. The method was very fast, taking only a few seconds for thousands of images. The fitted intensity surface was integrated to provide an estimate for all unsampled areas, which was then added to the observed counts. The proposed method also provides a finite-area correction factor for variance estimation. The intensity surface from an inhomogeneous Poisson process tends to be too smooth for locally clustered points, typical of animal distributions, so we introduce several new overdispersion estimators motivated by the poor performance of the classical one. We used simulated data to examine estimation bias and to investigate several variance estimators with overdispersion. A real example is given of harbor seal counts from aerial surveys in an Alaskan glacial fjord.
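A toy version of the change-of-support idea (our own simplified sketch, not the implementation from the talk): treat plot counts as Poisson with mean equal to plot area times an intensity whose logarithm is a linear combination of spatial basis functions, fit by a Poisson GLM with a log-area offset, then integrate the fitted intensity over the unsampled area and add the observed counts.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

def basis(s):
    # Hypothetical Gaussian radial basis functions (plus intercept) on [0,1]^2.
    knots = np.array([[i, j] for i in (0.25, 0.75) for j in (0.25, 0.75)])
    d2 = ((s[:, None, :] - knots[None, :, :]) ** 2).sum(-1)
    return np.column_stack([np.ones(len(s)), np.exp(-d2 / 0.1)])

# Irregular, non-randomized plot centroids with equal plot areas.
cents = rng.uniform(0.1, 0.9, size=(150, 2))
area = 0.001
true_intensity = lambda s: 2000 * np.exp(-((s - 0.3) ** 2).sum(-1) / 0.2)
counts = rng.poisson(area * true_intensity(cents))

# Poisson GLM for the counts, with log(plot area) as the offset.
fit = sm.GLM(counts, basis(cents), family=sm.families.Poisson(),
             offset=np.full(len(cents), np.log(area))).fit()

# Integrate the fitted intensity over a fine grid; crudely scale by the
# unsampled fraction of the region, then add the observed counts.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 100),
                            np.linspace(0, 1, 100)), -1).reshape(-1, 2)
cell = 1.0 / grid.shape[0]
unsampled = np.exp(basis(grid) @ fit.params).sum() * cell * (1 - len(cents) * area)
print("estimated total:", counts.sum() + unsampled)
```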
Bio: Jay Ver Hoef is a statistician for the National Marine Mammal Lab of the National Oceanic and Atmospheric Administration (NOAA), U.S. Dept. of Commerce. He obtained a Ph.D. in Statistics from Iowa State University in 1991. Jay develops statistical methods and consults on a wide variety of topics related to marine mammals and stream networks. The Marine Mammal Lab is located in Seattle, Washington, although Jay lives in Fairbanks, Alaska. His main statistical interests are in spatial statistics and Bayesian statistics, especially applied to ecological and environmental data. Jay is a fellow of the American Statistical Association.
For the best part of four decades, multivariate analysis in ecology has diverged substantially from mainstream statistics, perhaps because the state of the art in 1980s statistics was not capable of handling the complexity frequently seen in multivariate abundance data collected simultaneously across many species. But the methods developed in the ecological literature, still widely used today, have some serious shortcomings that suggest they are fast approaching their use-by date. The statistical literature appears to be "catching up" with ecology, in part through technologies for fitting quite flexible hierarchical models capable of accommodating key data structure. There is a significant movement now to reunify multivariate analysis in ecology with modern statistical practices. Some key developments on this front will be reviewed, and immediate challenges identified.
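One concrete example of the model-based direction (a minimal sketch in the spirit of the manyglm approach in the R package mvabund, with invented data): fit one count GLM per species against the environmental predictors and combine the per-species likelihood-ratio statistics into a community-level statistic, whose null distribution would in practice be obtained by resampling rows so that correlation between species is respected.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_sites, n_species = 100, 12
env = rng.normal(size=n_sites)
X1 = sm.add_constant(env)     # alternative model: intercept + environment
X0 = X1[:, :1]                # null model: intercept only

# Simulated abundance matrix: species respond to the environment with
# species-specific log-linear coefficients.
Y = rng.poisson(np.exp(0.5 + 0.4 * np.outer(env, rng.normal(size=n_species))))

lr_stats = []
for j in range(n_species):
    full = sm.GLM(Y[:, j], X1, family=sm.families.Poisson()).fit()
    null = sm.GLM(Y[:, j], X0, family=sm.families.Poisson()).fit()
    lr_stats.append(2 * (full.llf - null.llf))   # per-species LR statistic
print("sum-of-LR community statistic:", sum(lr_stats))
```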
Bio: David trained jointly between ecology and statistics to postgraduate level (at Sydney and Macquarie), and his research since then has been at the interface of these two disciplines, addressing what he considers to be a significant knowledge gap. He has organised the Eco-Stats conference (this December at UNSW), and its previous iteration in 2013, to help create connections across these disciplines. His research ranges from the applied to the theoretical: translating modern statistical tools to ecology, and developing new statistical methodology motivated by ecological problems, including contributions to high-dimensional data analysis, resampling, point process modelling and model selection.